Although food is a basic human right, many people around the world live in highly food-insecure conditions. According to the World Bank, domestic food price inflation increased by at least 5 percent globally after the pandemic hit. According to United Nations news, 828 million people were affected by hunger in 2021. It is important to analyse which factors affect food security and to take measures to reduce world hunger. This is one of the reasons we were interested in studying food security in the USA. There were, however, limitations to our study. Food security is a very complex issue and needs thorough research from multiple angles. We have chosen only certain socio-economic variables out of the 507 available.
We are using the Current Population Survey Food Security Supplement data for December 2021, provided by the US Census Bureau.
The dataset contains 507 variables and roughly 120,000 observations.
Specific: Study the patterns in the data that relate to food security across states, counties, income level, SNAP use, race, immigrant status, work status, education level, and other demographic and socio-economic variables.
Measurable: Use EDA techniques to find out how strongly different factors contribute to food insecurity.
Achievable: Identify the variables that significantly affect food insecurity, so that models supporting household food security can be built on them.
Relevant: Since food is a basic requirement of every human, this study can shed light on what the authorities, and we ourselves, can do to eradicate food insecurity.
Time-oriented: The data set for December 2021 is used so that the study can also show the effect of Covid-19 on food security.
Considering the questions we are asking, we have decided to select just 11 factors to work on.
A very significant limitation of our data is that we have trimmed off many observations where the interview was either not taken or not completed. Ideally we should account for these observations somehow, but due to time constraints we are not doing that.
## 'data.frame': 71472 obs. of 12 variables:
## $ Id : Factor w/ 27922 levels "5185410966","8178510165",..: 16600 9378 9378 8472 8472 7861 7861 19375 19375 24604 ...
## $ States : Factor w/ 51 levels "1","2","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Family_Size : Factor w/ 14 levels "1","2","3","4",..: 1 2 2 2 2 2 2 2 2 1 ...
## $ Household_Income : Factor w/ 16 levels "1","2","3","4",..: 16 14 14 12 12 13 13 9 9 11 ...
## $ SNAP : Factor w/ 5 levels "-3","-2","-1",..: 3 3 3 5 5 3 3 5 5 3 ...
## $ Ethnicity : Factor w/ 24 levels "1","2","3","4",..: 1 1 1 1 1 1 1 2 2 1 ...
## $ Citizenship_status: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Number_of_Jobs : Factor w/ 4 levels "-1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ Hours_on_Jobs : Factor w/ 88 levels "-4","-1","0",..: 67 43 2 43 43 62 43 2 2 2 ...
## $ Education_Level : Factor w/ 17 levels "-1","31","32",..: 14 15 1 14 14 10 10 7 10 5 ...
## $ FoodSecurity_score: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 2 2 1 ...
## $ PRNMCHLD : Factor w/ 12 levels "0","1","2","3",..: 1 2 1 1 1 1 1 1 1 1 ...
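The trimming and column selection described above could look roughly like the sketch below. It uses a tiny made-up data frame; the column names (`interview_status`, `state_code`, `family_size`, `fs_score`) and the meaning of the codes are assumptions for illustration, not the names used in the survey file.

```r
# Toy stand-in for the raw survey file; all column names and codes are hypothetical.
raw <- data.frame(
  interview_status = c(1, 1, 2, 1, 3, 1),   # assumed: 1 = completed interview
  state_code       = c(1, 1, 2, 4, 5, 6),
  family_size      = c(1, 2, 2, 3, 1, 4),
  fs_score         = c(1, 1, 2, -1, 3, 4),  # assumed: negative = no valid score
  unused_column    = letters[1:6]
)

keep_cols <- c("state_code", "family_size", "fs_score")

# Keep completed interviews with a valid score, then keep only the chosen columns.
completed <- raw$interview_status == 1 & raw$fs_score > 0
fs_subset <- raw[completed, keep_cols]

# Treat every coded variable as categorical.
fs_subset[] <- lapply(fs_subset, factor)

str(fs_subset)

# Number of rows lost to the filter, which is worth reporting alongside the results.
dropped <- nrow(raw) - nrow(fs_subset)
dropped
```

Reporting `dropped` next to the analysis makes the size of the limitation visible to the reader.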
High Food Security: No reported indications of food-access problems or limitations.
Marginal Food Security: One or two reported signs, usually anxiety over food availability or scarcity in the home. There is little to no evidence that diets or food intake have changed.
Low Food Security: Reports of reduced quality, variety, or desirability of diet. Little to no indication of reduced food intake.
Very Low Food Security: Reports of numerous signs of altered eating habits and decreased food intake.
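A minimal sketch of attaching these four labels to the numeric codes of `FoodSecurity_score` follows. The mapping of code 1 to High through code 4 to Very Low is an assumption based on the order of the definitions above, and the values in `score_codes` are made up.

```r
# Assumed mapping: 1 = High, 2 = Marginal, 3 = Low, 4 = Very Low.
fs_labels <- c("High Food Security", "Marginal Food Security",
               "Low Food Security", "Very Low Food Security")

score_codes <- c(1, 1, 2, 4, 3, 1)   # toy values, not survey data

# Turn the codes into a labelled factor so tables and plots are readable.
status <- factor(score_codes, levels = 1:4, labels = fs_labels)

table(status)
```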
We use Fisher’s exact test instead of the Chi-square test because many levels have a low frequency of observations.
Our null hypothesis is that Ethnicity and Food Security Status are independent of each other.
We take our alpha to be 5%.
##
## Fisher's Exact Test for Count Data with simulated p-value (based on
## 2000 replicates)
##
## data: FS_Subset$Ethnicity and FS_Subset$FoodSecurity_score
## p-value = 0.0004998
## alternative hypothesis: two.sided
Since the p-value is less than our chosen alpha, we reject the null hypothesis and conclude that there is a statistically significant relationship between Ethnicity and Food Security.
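The output above reports a simulated p-value based on 2000 replicates. A minimal sketch of such a call is shown below on a made-up table with sparse cells; the counts and labels are illustrative only. With `B` replicates, `fisher.test` cannot report a simulated p-value below `1 / (B + 1)`, which for `B = 2000` is about 0.0004998, the value shown above. A p-value at this floor means that none of the simulated tables was as extreme as the observed one.

```r
set.seed(1)

# Toy contingency table with several sparse cells (counts are made up).
tab <- matrix(c(30, 12, 5, 2,
                 8,  6, 4, 3,
                 2,  1, 3, 4),
              nrow = 3, byrow = TRUE,
              dimnames = list(group  = c("A", "B", "C"),
                              status = c("High", "Marginal", "Low", "Very low")))

B <- 2000

# Monte Carlo version of Fisher's exact test for tables larger than 2 x 2.
res <- fisher.test(tab, simulate.p.value = TRUE, B = B)
res$p.value

# Smallest p-value that B replicates can produce.
floor_p <- 1 / (B + 1)
floor_p
```

Increasing `B` lowers this floor at the cost of a longer run time.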
We are using the Chi-square test.
Our null hypothesis is that Citizenship Status and Food Security Status are independent of each other.
We take our alpha to be 5%.
##
## Pearson's Chi-squared test
##
## data: FS_Subset$Citizenship_status and FS_Subset$FoodSecurity_score
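## X-squared = 437.62, df = 12, p-value < 2.2e-16
Since the p-value is less than our chosen alpha, we reject the null hypothesis and conclude that there is a statistically significant relationship between Citizenship Status and Food Security.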
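A sketch of this kind of independence test on simulated data is given below. The variable names and probabilities are made up; only the shape of the table (5 citizenship levels by 4 food security levels) follows the data described above. The degrees of freedom are (rows - 1) x (columns - 1), which gives (5 - 1) x (4 - 1) = 12, as in the reported output.

```r
set.seed(42)
n <- 2000

# Simulated categorical variables (independent by construction).
citizenship <- factor(sample(1:5, n, replace = TRUE,
                             prob = c(0.70, 0.05, 0.05, 0.10, 0.10)),
                      levels = 1:5)
score <- factor(sample(1:4, n, replace = TRUE,
                       prob = c(0.70, 0.12, 0.12, 0.06)),
                levels = 1:4)

# Cross-tabulate, then test independence of rows and columns.
tab  <- table(citizenship, score)
test <- chisq.test(tab)

test$parameter                  # degrees of freedom
reject <- test$p.value < 0.05   # decision at alpha = 5%
reject
```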
We are using the Chi-square test.
Our null hypothesis is that SNAP Status and Food Security Status are independent of each other.
We take our alpha to be 5%.
##
## Pearson's Chi-squared test
##
## data: chi_test_SNAP
## X-squared = 764.1, df = 3, p-value < 2.2e-16
## Outcome + Outcome - Total Inc risk * Odds
## Exposed + 4471 2737 7208 62.0 1.63
## Exposed - 14258 4008 18266 78.1 3.56
## Total 18729 6745 25474 73.5 2.78
##
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc risk ratio 0.79 (0.78, 0.81)
## Odds ratio 0.46 (0.43, 0.49)
## Attrib risk in the exposed * -16.03 (-17.30, -14.76)
## Attrib fraction in the exposed (%) -25.84 (-28.34, -23.40)
## Attrib risk in the population * -4.54 (-5.34, -3.73)
## Attrib fraction in the population (%) -6.17 (-6.68, -5.66)
## -------------------------------------------------------------------
## Uncorrected chi2 test that OR = 1: chi2(1) = 682.162 Pr>chi2 = <0.001
## Fisher exact test that OR = 1: Pr>chi2 = <0.001
## Wald confidence limits
## CI: confidence interval
## * Outcomes per 100 population units
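The point estimates in the output above can be reproduced by hand from the four cell counts, which helps when reading the table. The sketch below copies the counts from the 2x2 table; the variable names are ours, and the confidence interval uses the standard Wald formula on the log odds ratio.

```r
# Counts copied from the 2x2 table above.
n11 <- 4471;  n12 <- 2737   # Exposed +: Outcome +, Outcome -
n21 <- 14258; n22 <- 4008   # Exposed -: Outcome +, Outcome -

# Incidence risk in each row, and their ratio.
risk_exposed   <- n11 / (n11 + n12)
risk_unexposed <- n21 / (n21 + n22)
risk_ratio     <- risk_exposed / risk_unexposed

# Odds ratio with a 95% Wald confidence interval.
odds_ratio <- (n11 * n22) / (n12 * n21)
se_log_or  <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
or_ci      <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(c(risk_ratio = risk_ratio, odds_ratio = odds_ratio), 2)  # 0.79 and 0.46
round(or_ci, 2)                                                # 0.43 and 0.49
```

An odds ratio below 1 means the odds of the outcome are lower in the exposed row than in the unexposed row.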
Now we conduct an in-depth EDA on the Number of Jobs of the respondents.
The above graph shows that the majority of respondents were not eligible to answer this question, and hence their response is marked as Not Applicable.
This obscures the analysis of the actual responses. Hence we remove the Not Applicable responses from the variable and move forward with the study.
The above graphs show how many people in each category of the variable fall under each food security score.
Although the majority of respondents report being highly food secure irrespective of their number of jobs, the group with 4 or more jobs contains a noticeable number of food-insecure people. Whether the two variables are actually dependent still needs to be checked with a proper statistical test. For this we bring in the Chi-square test.
##
## Pearson's Chi-squared test
##
## data: contable_number_of_jobs
## X-squared = 8.4874, df = 6, p-value = 0.2045
The test produced warnings because the counts in some cells of the contingency table are very low. The p-value of the Chi-square test is 0.2045, which is greater than the default alpha of 0.05. Hence we fail to reject the null hypothesis: there is no evidence that Number of Jobs is significantly associated with Food Security.
The test shows that, although the proportions vary in the graph, there is no evidence of dependency between these two variables.
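When `chisq.test` warns that the approximation may be incorrect, one option is to inspect the expected counts and, if they are small, fall back to a simulated p-value. The sketch below shows this on a made-up 3 x 4 table; the counts are illustrative, and the threshold of 5 is a common rule of thumb rather than something stated in this report.

```r
set.seed(7)

# Toy table: number of jobs (rows) by food security status (columns); counts are made up.
tab <- matrix(c(900, 120, 110, 40,
                 60,   9,   8,  3,
                  6,   1,   2,  1),
              nrow = 3, byrow = TRUE,
              dimnames = list(jobs   = c("2", "3", "4"),
                              status = c("High", "Marginal", "Low", "Very low")))

# Asymptotic test; the warning is suppressed because it is handled explicitly below.
asymptotic   <- suppressWarnings(chisq.test(tab))
min_expected <- min(asymptotic$expected)

# Rule of thumb: expected counts of at least 5 in every cell.
if (min_expected < 5) {
  simulated <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
  p_value   <- simulated$p.value
} else {
  p_value <- asymptotic$p.value
}

min_expected
p_value
```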
The level of education is very important. We generally assume that education is a powerful tool for eradicating poverty and hunger. Assuming this also holds for food security, we now turn to the EDA of this variable. The variable has many response categories.
As an initial setup step, the “Not Applicable” responses were removed from the data, following the instructions in the technical documentation.
Since the responses fall into many categories, it is useful to look at their frequency table.
The frequency table gives us an outline of the sample we are using. Most of the people in the survey are educated to high school level or above. This is very promising for the overall growth of society and gives hope for a better future.
Even though the frequency table shows a mostly educated sample, the food security score of these people still needs to be studied. The first tool we can use for this is graphical representation.
These three graphs give us a clear picture of the sample for this variable. The first graph shows that a major proportion of the population has a high school education or more. The next two graphs show the breakdown of each category by food security score. Although high food security accounts for the largest share, we can clearly see the proportion of people with low food security decreasing as the education level increases.
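The trend read off the graphs can also be checked numerically with row proportions. The sketch below uses a made-up table of education level by food security status; the category names and counts are illustrative only.

```r
# Toy counts: education level (rows) by food security status (columns).
tab <- matrix(c( 50, 20, 20, 10,
                140, 30, 20, 10,
                180, 10,  7,  3),
              nrow = 3, byrow = TRUE,
              dimnames = list(education = c("Less than high school",
                                            "High school",
                                            "Bachelor's or more"),
                              status = c("High", "Marginal", "Low", "Very low")))

# Share of each status within an education level (each row sums to 1).
row_share <- prop.table(tab, margin = 1)
round(100 * row_share, 1)

# Combined share of Low and Very low food security per education level.
insecure_share <- rowSums(row_share[, c("Low", "Very low")])
insecure_share
```

Comparing shares rather than raw counts keeps large categories from dominating the picture.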
But this hypothesis needs to be supported by appropriate statistical tests. Since both variables under study are categorical, we use the Chi-square test to test their independence.
The Chi-square test results are as follows:
##
## Pearson's Chi-squared test
##
## data: contable_edu
## X-squared = 3136.8, df = 48, p-value < 2.2e-16
The test results show a p-value below 0.05, the default alpha value,
and hence we can say that Education_Level has a significant
association with the food security of people. This is an important
factor to keep in mind, because it suggests that giving people more
education could act as part of a solution, which gives us hope. Since
the majority of the population is or will be educated, this variable is
one of the most important in the study.
The hours worked per week vary from person to person. How the food security score relates to hours on the job is studied in the following EDA.
A few respondents answered that their hours vary and that they have no fixed working time. To meet the assumptions of the test and to simplify the analysis, those responses were removed. Removing them made the variable continuous, with an acceptable range of responses from 0 to 99.
## [1] 2
The above results show that the responses ranged between 1 and 88. Although the average is around 20 hours of work, it is surprising that the most frequent response (the mode) is 2 hours. This needs to be verified and studied further, as such a trend is not one to encourage. One possible reason for a mode of 2 hours per week is that these respondents were students. Further study is needed on this.
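Because `Hours_on_Jobs` is stored as a factor, one thing worth ruling out when verifying a surprising mode is the factor-to-number conversion. The sketch below, on made-up values, shows that `as.numeric()` applied to a factor returns internal level codes rather than the recorded hours, and includes a small helper for the mode (base R has no built-in function for the statistical mode). Treating negative codes as non-numeric responses is an assumption here.

```r
# Toy factor of weekly hours; negative codes are assumed to mark non-numeric responses.
hours_f <- factor(c("40", "40", "20", "-1", "2", "40", "-4", "20"))

# Pitfall: this returns the internal level codes (1, 2, 3, ...), not the hours.
codes <- as.numeric(hours_f)

# Convert through character to recover the recorded values, then drop the negative codes.
hours <- as.numeric(as.character(hours_f))
hours <- hours[hours >= 0]

# Helper: most frequent value of a numeric vector.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

range(hours)
mean(hours)
stat_mode(hours)
```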
The above graphs depict the distribution of the data for this variable.
The following is a bar graph for the same variable.
From the graph, we can see that although the boxes lie in roughly the same region, a few of the classes show a very evident difference. This needs to be checked. Whether hours on the job differ across food security scores can be tested using ANOVA, and the post-hoc test then identifies which groups differ.
## Df Sum Sq Mean Sq F value Pr(>F)
## FoodSecurity_score 3 279623 93208 214.6 <2e-16 ***
## Residuals 71468 31045414 434
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hours_on_Jobs ~ FoodSecurity_score, data = food_hoj)
##
## $FoodSecurity_score
## diff lwr upr
## Marginal Food Security-High Food Security -4.2140290 -4.972646 -3.4554123
## Low Food Security-High Food Security -5.6863555 -6.488530 -4.8841810
## Very Low Food Security-High Food Security -6.0702228 -7.206149 -4.9342962
## Low Food Security-Marginal Food Security -1.4723264 -2.531399 -0.4132541
## Very Low Food Security-Marginal Food Security -1.8561937 -3.186036 -0.5263518
## Very Low Food Security-Low Food Security -0.3838673 -1.739029 0.9712947
## p adj
## Marginal Food Security-High Food Security 0.0000000
## Low Food Security-High Food Security 0.0000000
## Very Low Food Security-High Food Security 0.0000000
## Low Food Security-Marginal Food Security 0.0020125
## Very Low Food Security-Marginal Food Security 0.0019070
## Very Low Food Security-Low Food Security 0.8860333
From the results, the large F value and small p-value give us enough confidence to say that the mean hours of work differ significantly across Food Security Score groups. The post-hoc test confirms that the Marginal, Low, and Very Low Food Security groups each differ significantly from the High Food Security group. The Low and Very Low groups also differ significantly from the Marginal group, though less strongly (adjusted p-values of about 0.002). The Very Low and Low groups do not differ significantly from each other (adjusted p-value of 0.886). These results suggest that hours on the job per week are related to the food insecurity that respondents face.
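The ANOVA and Tukey steps above follow a standard pattern in base R, sketched below on simulated data. The group sizes, means, and standard deviation are made up, so the numbers will not match the report; only the calls (`aov`, `TukeyHSD`) mirror the output shown.

```r
set.seed(123)

# Simulated groups with made-up sizes and mean weekly hours.
status_levels <- c("High", "Marginal", "Low", "Very low")
status <- factor(rep(status_levels, times = c(200, 60, 50, 30)),
                 levels = status_levels)
group_means <- c("High" = 26, "Marginal" = 22, "Low" = 20, "Very low" = 20)
hours <- rnorm(length(status), mean = group_means[as.character(status)], sd = 10)
toy <- data.frame(hours = hours, status = status)

# One-way ANOVA: do mean hours differ across food security groups?
fit <- aov(hours ~ status, data = toy)
summary(fit)

# Tukey post-hoc comparisons: which pairs of groups differ?
tukey <- TukeyHSD(fit, "status", conf.level = 0.95)
p_adj <- tukey$status[, "p adj"]
significant <- p_adj < 0.05
significant
```

With four groups there are six pairwise comparisons, and the adjusted p-values control the family-wise error rate across all of them.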
We wanted to check whether the State variable has an impact on food security. However, because each state had a different number of respondents, it was difficult to base the analysis on the State variable alone. For example, California has the highest number of respondents (6975), whereas Maine has the smallest (564), a more than tenfold difference between the two states. To make comparisons, we chose pairs of states with similar numbers of respondents: Alabama (1207) versus Washington DC (1207), Florida (2738) versus New York (2580), and Illinois (2052) versus Pennsylvania (1928). Looking at the graphs, Alabama has more food insecurity than DC, while Florida and New York have similar food security levels, as do Illinois and Pennsylvania.
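Another way to compare states with very different numbers of respondents is to look at shares instead of counts. The sketch below uses two made-up states; the names and numbers are illustrative and not taken from the survey.

```r
# Toy data: one large and one small state, with a TRUE/FALSE food insecurity flag.
toy <- data.frame(
  state = rep(c("Large state", "Small state"), times = c(1000, 100)),
  insecure = c(rep(c(TRUE, FALSE), times = c(100, 900)),
               rep(c(TRUE, FALSE), times = c(15, 85)))
)

# Raw counts favour the large state simply because it has more respondents.
counts <- tapply(toy$insecure, toy$state, sum)

# Shares put both states on the same scale.
shares <- tapply(toy$insecure, toy$state, mean)

counts
shares
```

Here the large state has more food-insecure respondents in absolute terms, but the small state has the higher share.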
Reference for household income codes:
1: Less than $5,000
2: $5,000 to $7,499
3: $7,500 to $9,999
4: $10,000 to $12,499
5: $12,500 to $14,999
6: $15,000 to $19,999
7: $20,000 to $24,999
8: $25,000 to $29,999
9: $30,000 to $34,999
10: $35,000 to $39,999
11: $40,000 to $49,999
12: $50,000 to $59,999
13: $60,000 to $74,999
14: $75,000 to $99,999
15: $100,000 to $149,999
16: $150,000 or more
##
## Pearson's Chi-squared test
##
## data: income_t
## X-squared = 9512.9, df = 45, p-value < 2.2e-16
We can say that household income is associated with food insecurity. When household income is less than $20,000, the household is more likely to face high food insecurity, and when it is between $20,000 and $40,000, the household tends to have low food security.
## Warning in chisq.test(family_t): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: family_t
## X-squared = 1691.7, df = 39, p-value < 2.2e-16
We can say that family size is associated with food insecurity. As family size gets bigger, the household is more likely to have very low food security.
As the boxplot shows, food insecurity is high when the family is large (more than 6 people). Household income also has a direct relationship with food security: when household income is higher than $40,000, the food security score is low, which on this scale corresponds to higher food security.
An interesting point from this graph is that when a large family also has a high household income, that family has high food security. Taken separately, family size and household income each have a significant relationship with food security. When they are combined, however, the result is different. For further analysis, we need to consider the age and employment type of the family members.
The aim of the study was to discuss the food insecurity that people still face in the USA. The study was limited to nine variables, but its primary objective was to understand how the most important factors can affect food security. Ethnicity, Citizenship, participation in SNAP, Education level, Hours on work, Household income, and Family size have a significant relationship with the Food Security Score. Out of all the chosen variables, these 7 showed a relationship to food security. Keeping an open mind about possible errors in survey data, it is better to rule out the other variables only after further study. There are hundreds of variables that still need to be studied and discussed.
The United Nations Sustainable Development Goals include the eradication of hunger. Governments should come up with possible solutions, but this does not relieve the rest of us of responsibility for eradicating food insecurity. It is everyone’s duty to build a society that is food secure.
The results of the study also show that further analysis could
produce models to predict the food security score, and hence reveal
the important sectors that officials should focus on in order to
eradicate food insecurity.
The initial results show that 10% of the population in the USA is
still food insecure, which accounts for about 33 million people. Food
being the very basic need of any living being, everyone should have
access to safe and proper food. This underscores the significance of
the study and the need for further studies. We hope that high-quality
data and thorough analysis will help to eradicate world hunger.
• https://www2.census.gov/programs-surveys/cps/datasets/2021/supp/dec21pub.csv
• https://www.census.gov/data/datasets/time-series/demo/cps/cps-supp_cps-repwgt/cps-food-security.html
• https://www.ers.usda.gov/data-products/food-security-in-the-united-states/
• https://r4ds.had.co.nz/exploratory-data-analysis.html
• https://www.webpages.uidaho.edu/
• http://statseducation.com/Introduction-to-R/modules/graphics/cont/